18 research outputs found

    deepregression: A Flexible Neural Network Framework for Semi-Structured Deep Distributional Regression

    In this paper we describe the implementation of semi-structured deep distributional regression, a flexible framework to learn conditional distributions based on the combination of additive regression models and deep networks. Our implementation encompasses (1) a modular neural network building system based on the deep learning library TensorFlow for the fusion of various statistical and deep learning approaches, (2) an orthogonalization cell to allow for an interpretable combination of different subnetworks, as well as (3) pre-processing steps necessary to set up such models. The software package allows models to be defined in a user-friendly manner via a formula interface that is inspired by classical statistical model frameworks such as mgcv. The package's modular design and functionality provide a unique resource for both scalable estimation of complex statistical models and the combination of approaches from deep learning and statistics. This allows for state-of-the-art predictive performance while simultaneously retaining the indispensable interpretability of classical statistical models
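
    The orthogonalization step mentioned in point (2) can be illustrated in a few lines. The sketch below uses plain NumPy and is not the deepregression package itself (which exposes an R formula interface); it projects hypothetical deep-network features onto the orthogonal complement of a structured design matrix so the deep subnetwork cannot absorb effects that belong to the interpretable part. All names and dimensions are illustrative assumptions.

```python
# Illustrative sketch of the orthogonalization idea, not the deepregression API.
# X: structured design matrix (e.g. spline basis / dummy-coded covariates)
# U: latent features produced by a deep subnetwork (hypothetical values here)
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 3, 8
X = rng.normal(size=(n, p))
U = rng.normal(size=(n, q))

# Projection onto the column space of X: P = X (X'X)^{-1} X'
P = X @ np.linalg.solve(X.T @ X, X.T)
U_orth = U - P @ U   # remove everything the structured part can already explain

# The orthogonalized deep features are (numerically) orthogonal to X,
# so the structured coefficients keep their usual interpretation.
print(np.abs(X.T @ U_orth).max())   # close to 0
```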

    Using interpretable machine learning to understand gene silencing dynamics during X-Chromosome inactivation

    To equalize gene dosage between the sexes, the long non-coding RNA Xist mediates chromosome-wide gene silencing of one X Chromosome in female mammals - a process known as X chromosome inactivation (XCI). The efficiency of gene silencing is highly variable across genes, with some genes even escaping XCI in somatic cells. A gene’s susceptibility to Xist-mediated silencing appears to be determined by a complex interplay of epigenetic and genomic features. However, the underlying rules remain poorly understood. To advance the understanding of Xist-mediated silencing pathways, chromosome-wide gene silencing dynamics at the level of the nascent transcriptome were quantified using allele-specific Precision nuclear Run-On sequencing. We have developed a Random Forest machine learning model that is able to predict the measured silencing dynamics based on a large set of epigenetic and genomic features and tested its predictive power experimentally. We introduced a forest-guided clustering approach to uncover the combinatorial rules that control Xist-mediated gene silencing. Results suggest that the genomic distance to the Xist locus, followed by gene density and distance to LINE elements, are the prime determinants of silencing velocity. Moreover, a series of features associated with active transcriptional elongation and chromatin 3D structure are enriched at efficiently silenced genes. Generally, silenced genes seem to be separated into two distinct groups, associated with different silencing pathways: one group that requires an AT-rich sequence context and the Xist repeat-A for silencing, which is known to activate the SPEN pathway, and another group where genes are pre-marked by Polycomb complexes and tend to rely on the repeat-B in Xist for silencing, known to recruit Polycomb complexes during XCI. Our machine learning approach can thus uncover the complex combinatorial rules underlying gene silencing during X chromosome inactivation.
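
    As a rough illustration of the modelling step described above (a sketch under assumptions, not the thesis pipeline), a Random Forest can be trained to predict a per-gene silencing rate from a table of epigenetic and genomic features; the feature names, the stand-in synthetic data, and the hyperparameters below are hypothetical placeholders.

```python
# Hypothetical sketch: predict per-gene silencing dynamics from a feature table.
# Synthetic stand-in data replaces the real epigenetic/genomic feature matrix.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
features = pd.DataFrame({
    "distance_to_Xist": rng.uniform(0, 100e6, 300),   # placeholder feature columns
    "gene_density": rng.uniform(0, 1, 300),
    "LINE_distance": rng.uniform(0, 1e6, 300),
    "H3K27me3": rng.uniform(0, 1, 300),
})
halftime = rng.lognormal(mean=0.0, sigma=0.5, size=300)  # stand-in silencing half-times

model = RandomForestRegressor(n_estimators=500, random_state=0)
print("10-fold CV R^2:", cross_val_score(model, features, halftime, cv=10).mean())

model.fit(features, halftime)
for name, imp in sorted(zip(features.columns, model.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name}: {imp:.3f}")   # which features the forest leans on
```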

    Additional file 1 of Improved prediction of bacterial CRISPRi guide efficiency from depletion screens through mixed-effect machine learning and data integration

    Additional file 1: Figure S1. Illustration of the genomic and sequence features used. Figure S2. Comparison of guide depletion across datasets. Figure S3. Spearman correlation of 10-fold cross-validation of models trained with one or mixed datasets. Figure S4. Data integration for retrained Pasteur and deep learning models. Figure S5. Interaction between distance features and whether the targeted gene is the first gene in the operon. Figure S6. Independent low-throughput validation of model performance. Figure S7. Additional figures related to the saturating screen of purine biosynthesis genes. Figure S8. Model performance of deep learning approaches. Supplementary Note: Deep learning approaches do not improve prediction performance

    Kinetics of Xist-induced gene silencing can be predicted from combinations of epigenetic and genomic features

    To initiate X-Chromosome inactivation (XCI), the long noncoding RNA Xist mediates chromosome-wide gene silencing of one X Chromosome in female mammals to equalize gene dosage between the sexes. The efficiency of gene silencing is highly variable across genes, with some genes even escaping XCI in somatic cells. A gene's susceptibility to Xist-mediated silencing appears to be determined by a complex interplay of epigenetic and genomic features; however, the underlying rules remain poorly understood. We have quantified chromosome-wide gene silencing kinetics at the level of the nascent transcriptome using allele-specific Precision nuclear Run-On sequencing (PRO-seq). We have developed a Random Forest machine-learning model that can predict the measured silencing dynamics based on a large set of epigenetic and genomic features and tested its predictive power experimentally. The genomic distance to the Xist locus, followed by gene density and distance to LINE elements, are the prime determinants of the speed of gene silencing. Moreover, we find two distinct gene classes associated with different silencing pathways: a class that requires Xist-repeat A for silencing, which is known to activate the SPEN pathway, and a second class in which genes are premarked by Polycomb complexes and tend to rely on the B repeat in Xist for silencing, known to recruit Polycomb complexes during XCI. Moreover, a series of features associated with active transcriptional elongation and chromatin 3D structure are enriched at rapidly silenced genes. Our machine-learning approach can thus uncover the complex combinatorial rules underlying gene silencing during X inactivation
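
    The forest-guided clustering mentioned above can be approximated with the classical Random Forest proximity measure: two genes are close if they frequently land in the same leaves of the forest. The sketch below is a generic illustration on synthetic data, not the authors' implementation; it builds a proximity matrix from a fitted scikit-learn forest and cuts a hierarchical clustering into two groups, mirroring the two silencing classes described in the abstract.

```python
# Generic proximity-based "forest-guided clustering" sketch on synthetic data.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=150, n_features=10, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

leaves = forest.apply(X)                                   # (n_samples, n_trees)
proximity = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
distance = 1.0 - proximity

# Condensed upper-triangular distances feed straight into hierarchical clustering
condensed = distance[np.triu_indices_from(distance, k=1)]
clusters = fcluster(linkage(condensed, method="average"), t=2, criterion="maxclust")
print(np.bincount(clusters)[1:])                           # sizes of the two groups
```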

    Characterisation of microbial attack on archaeological bone

    As part of an EU funded project to investigate the factors influencing bone preservation in the archaeological record, more than 250 bones from 41 archaeological sites in five countries spanning four climatic regions were studied for diagenetic alteration. Sites were selected to cover a range of environmental conditions and archaeological contexts. Microscopic and physical (mercury intrusion porosimetry) analyses of these bones revealed that the majority (68%) had suffered microbial attack. Furthermore, significant differences were found between animal and human bone in both the state of preservation and the type of microbial attack present. These differences in preservation might result from differences in early taphonomy of the bones.

    Growing knowledge: an overview of Seed Plant diversity in Brazil


    Highly-parallelized simulation of a pixelated LArTPC on a GPU

    The rapid development of general-purpose computing on graphics processing units (GPGPU) is allowing the implementation of highly-parallelized Monte Carlo simulation chains for particle physics experiments. This technique is particularly suitable for the simulation of a pixelated charge readout for time projection chambers, given the large number of channels that this technology employs. Here we present the first implementation of a full microphysical simulator of a liquid argon time projection chamber (LArTPC) equipped with light readout and pixelated charge readout, developed for the DUNE Near Detector. The software is implemented with an end-to-end set of GPU-optimized algorithms. The algorithms have been written in Python and translated into CUDA kernels using Numba, a just-in-time compiler for a subset of Python and NumPy instructions. The GPU implementation achieves a speed-up of four orders of magnitude compared with the equivalent CPU version. The simulation of the current induced on 10^3 pixels takes around 1 ms on the GPU, compared with approximately 10 s on the CPU. The results of the simulation are compared against data from a pixel-readout LArTPC prototype
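
    To make the Numba/CUDA pattern concrete, here is a minimal sketch of a GPU kernel that spreads drifted charge deposits onto a row of pixels with a Gaussian transverse profile. It is not the DUNE Near Detector simulator; the geometry, charge model, and numbers are placeholder assumptions, and running it requires a CUDA-capable GPU.

```python
# Minimal Numba CUDA sketch: accumulate induced charge per pixel (illustrative only).
import math
import numpy as np
from numba import cuda

@cuda.jit
def deposit_charge(dep_x, dep_q, dep_sigma, pitch, pixel_charge):
    i = cuda.grid(1)                        # one thread per charge deposit
    if i >= dep_x.size:
        return
    x, q, sigma = dep_x[i], dep_q[i], dep_sigma[i]
    lo = max(int((x - 4.0 * sigma) / pitch), 0)
    hi = min(int((x + 4.0 * sigma) / pitch) + 1, pixel_charge.size)
    for p in range(lo, hi):                 # split the charge over nearby pixels
        center = (p + 0.5) * pitch
        w = (pitch / (sigma * math.sqrt(2.0 * math.pi))
             * math.exp(-0.5 * ((center - x) / sigma) ** 2))
        cuda.atomic.add(pixel_charge, p, q * w)

n_dep, n_pix, pitch = 10_000, 1_000, 0.4    # placeholder numbers
rng = np.random.default_rng(0)
dep_x = rng.uniform(0, n_pix * pitch, n_dep).astype(np.float32)
dep_q = rng.exponential(1.0, n_dep).astype(np.float32)
dep_sigma = np.full(n_dep, 0.6, dtype=np.float32)
pixels = np.zeros(n_pix, dtype=np.float32)

threads = 128
blocks = (n_dep + threads - 1) // threads
deposit_charge[blocks, threads](dep_x, dep_q, dep_sigma, np.float32(pitch), pixels)
print("total collected charge:", pixels.sum())
```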

    Impact of cross-section uncertainties on supernova neutrino spectral parameter fitting in the Deep Underground Neutrino Experiment

    A primary goal of the upcoming Deep Underground Neutrino Experiment (DUNE) is to measure the O(10) MeV neutrinos produced by a Galactic core-collapse supernova if one should occur during the lifetime of the experiment. The liquid-argon-based detectors planned for DUNE are expected to be uniquely sensitive to the νe component of the supernova flux, enabling a wide variety of physics and astrophysics measurements. A key requirement for a correct interpretation of these measurements is a good understanding of the energy-dependent total cross section σ(Eν) for charged-current νe absorption on argon. In the context of a simulated extraction of supernova νe spectral parameters from a toy analysis, we investigate the impact of σ(Eν) modeling uncertainties on DUNE’s supernova neutrino physics sensitivity for the first time. We find that the currently large theoretical uncertainties on σ(Eν) must be substantially reduced before the νe flux parameters can be extracted reliably; in the absence of external constraints, a measurement of the integrated neutrino luminosity with less than 10% bias with DUNE requires σ(Eν) to be known to about 5%. The neutrino spectral shape parameters can be known to better than 10% for a 20% uncertainty on the cross-section scale, although they will be sensitive to uncertainties on the shape of σ(Eν). A direct measurement of low-energy νe-argon scattering would be invaluable for improving the theoretical precision to the needed level
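
    As a toy illustration of why σ(Eν) matters (illustrative assumptions only, not the analysis in the paper), the expected charged-current event spectrum is roughly the product of the commonly used pinched-thermal flux parameterization, with mean energy ⟨E⟩, pinching parameter α, and total luminosity L, and the cross section σ(E); the placeholder σ(E) below is an arbitrary power law, so the numbers carry no physical meaning.

```python
# Toy folding of a pinched-thermal nu_e flux with a placeholder cross section.
import numpy as np
from scipy.special import gamma

def pinched_flux(E, E_mean, alpha, L):
    """Pinched-thermal spectrum: unit-normalized shape scaled to L / <E> neutrinos."""
    shape = ((alpha + 1.0) ** (alpha + 1.0) / (E_mean * gamma(alpha + 1.0))
             * (E / E_mean) ** alpha * np.exp(-(alpha + 1.0) * E / E_mean))
    return (L / E_mean) * shape

def sigma_cc(E, scale=1.0):
    """Placeholder nu_e-argon cross section (arbitrary power law and units)."""
    return scale * (E / 10.0) ** 2

E = np.linspace(1.0, 60.0, 600)                       # MeV
nominal = pinched_flux(E, E_mean=11.0, alpha=2.5, L=1.0) * sigma_cc(E)

# A pure scale error on sigma(E) rescales the whole event spectrum and is absorbed
# by the fitted luminosity; a shape error would instead bias <E> and alpha.
scaled = pinched_flux(E, E_mean=11.0, alpha=2.5, L=1.0) * sigma_cc(E, scale=1.2)
print("event-rate ratio:", scaled.sum() / nominal.sum())   # ~1.2
```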

    Health-status outcomes with invasive or conservative care in coronary disease

    BACKGROUND In the ISCHEMIA trial, an invasive strategy with angiographic assessment and revascularization did not reduce clinical events among patients with stable ischemic heart disease and moderate or severe ischemia. A secondary objective of the trial was to assess angina-related health status among these patients. METHODS We assessed angina-related symptoms, function, and quality of life with the Seattle Angina Questionnaire (SAQ) at randomization, at months 1.5, 3, and 6, and every 6 months thereafter in participants who had been randomly assigned to an invasive treatment strategy (2295 participants) or a conservative strategy (2322). Mixed-effects cumulative probability models within a Bayesian framework were used to estimate differences between the treatment groups. The primary outcome of this health-status analysis was the SAQ summary score (scores range from 0 to 100, with higher scores indicating better health status). All analyses were performed in the overall population and according to baseline angina frequency. RESULTS At baseline, 35% of patients reported having no angina in the previous month. SAQ summary scores increased in both treatment groups, with increases at 3, 12, and 36 months that were 4.1 points (95% credible interval, 3.2 to 5.0), 4.2 points (95% credible interval, 3.3 to 5.1), and 2.9 points (95% credible interval, 2.2 to 3.7) higher with the invasive strategy than with the conservative strategy. Differences were larger among participants who had more frequent angina at baseline (8.5 vs. 0.1 points at 3 months and 5.3 vs. 1.2 points at 36 months among participants with daily or weekly angina as compared with no angina). CONCLUSIONS In the overall trial population with moderate or severe ischemia, which included 35% of participants without angina at baseline, patients randomly assigned to the invasive strategy had greater improvement in angina-related health status than those assigned to the conservative strategy. The modest mean differences favoring the invasive strategy in the overall group reflected minimal differences among asymptomatic patients and larger differences among patients who had had angina at baseline

    Initial invasive or conservative strategy for stable coronary disease

    BACKGROUND Among patients with stable coronary disease and moderate or severe ischemia, whether clinical outcomes are better in those who receive an invasive intervention plus medical therapy than in those who receive medical therapy alone is uncertain. METHODS We randomly assigned 5179 patients with moderate or severe ischemia to an initial invasive strategy (angiography and revascularization when feasible) and medical therapy or to an initial conservative strategy of medical therapy alone and angiography if medical therapy failed. The primary outcome was a composite of death from cardiovascular causes, myocardial infarction, or hospitalization for unstable angina, heart failure, or resuscitated cardiac arrest. A key secondary outcome was death from cardiovascular causes or myocardial infarction. RESULTS Over a median of 3.2 years, 318 primary outcome events occurred in the invasive-strategy group and 352 occurred in the conservative-strategy group. At 6 months, the cumulative event rate was 5.3% in the invasive-strategy group and 3.4% in the conservative-strategy group (difference, 1.9 percentage points; 95% confidence interval [CI], 0.8 to 3.0); at 5 years, the cumulative event rate was 16.4% and 18.2%, respectively (difference, -1.8 percentage points; 95% CI, -4.7 to 1.0). Results were similar with respect to the key secondary outcome. The incidence of the primary outcome was sensitive to the definition of myocardial infarction; a secondary analysis yielded more procedural myocardial infarctions of uncertain clinical importance. There were 145 deaths in the invasive-strategy group and 144 deaths in the conservative-strategy group (hazard ratio, 1.05; 95% CI, 0.83 to 1.32). CONCLUSIONS Among patients with stable coronary disease and moderate or severe ischemia, we did not find evidence that an initial invasive strategy, as compared with an initial conservative strategy, reduced the risk of ischemic cardiovascular events or death from any cause over a median of 3.2 years. The trial findings were sensitive to the definition of myocardial infarction that was used